16 Independence

16.1 Definitions

Suppose that we flip two fair coins simultaneously on opposite sides of a room.
Intuitively, the way one coin lands does not affect the way the other coin lands.
The mathematical concept that captures this intuition is called independence:

Definition 16.1.1. Events A and B are independent if \(\Pr[B] = 0\) or if

\[ \Pr[A \mid B] = \Pr[A]. \tag{16.1} \]

In other words, A and B are independent if knowing that B happens does not alter
the probability that A happens, as is the case with flipping two coins on opposite
sides of a room.

16.1.1 Potential Pitfall

Students sometimes get the idea that disjoint events are independent. The opposite
is true: if \(A \cap B = \emptyset\), then knowing that A happens means you know that B
does not happen. So disjoint events are never independent, unless one of them has
probability zero.

16.1.2 Alternative Formulation

Sometimes it is useful to express independence in an alternate form:

Theorem 16.1.2. A and B are independent if and only if

\[ \Pr[A \cap B] = \Pr[A] \cdot \Pr[B]. \tag{16.2} \]

Proof. There are two cases to consider depending on whether or not \(\Pr[B] = 0\).

Case 1 (\(\Pr[B] = 0\)): If \(\Pr[B] = 0\), A and B are independent by Definition 16.1.1.
In addition, Equation 16.2 holds since both sides are 0 (note that
\(\Pr[A \cap B] \le \Pr[B] = 0\)). Hence, the theorem is true in this case.

Case 2 (\(\Pr[B] > 0\)): By Definition 15.1.1,

\[ \Pr[A \cap B] = \Pr[A \mid B] \Pr[B]. \]

So Equation 16.2 holds if and only if

\[ \Pr[A \mid B] = \Pr[A], \]

which, by Definition 16.1.1, is true iff A and B are independent. Hence, the
theorem is true in this case as well. ■
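
The agreement between Definition 16.1.1 and Theorem 16.1.2 can be checked mechanically
on a small sample space. The following Python sketch is only an illustration, not part
of the proof. It assumes the uniform two-coin model from the start of this section, and
the helper names (pr, independent_by_definition, independent_by_product) are our own.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of coin faces, each with probability 1/4.
# (The uniform assignment is a modeling assumption, as in the text.)
outcomes = list(product("HT", repeat=2))
prob = {w: Fraction(1, 4) for w in outcomes}

def pr(event):
    """Probability of an event given as a set of outcomes."""
    return sum((prob[w] for w in event), Fraction(0))

A = {w for w in outcomes if w[0] == "H"}   # first coin is heads
B = {w for w in outcomes if w[1] == "H"}   # second coin is heads

def independent_by_definition(A, B):
    """Definition 16.1.1: Pr[B] = 0, or Pr[A | B] = Pr[A]."""
    if pr(B) == 0:
        return True
    return pr(A & B) / pr(B) == pr(A)

def independent_by_product(A, B):
    """Theorem 16.1.2: Pr[A ∩ B] = Pr[A] · Pr[B]."""
    return pr(A & B) == pr(A) * pr(B)

print(independent_by_definition(A, B), independent_by_product(A, B))  # True True
```

Exact fractions are used instead of floating point so that the equality tests are
reliable. Trying the disjoint events {HH} and {TT} in place of A and B makes both
checks fail, in line with the pitfall described in Section 16.1.1.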



16.2 Independence Is an Assumption 

Generally, independence is something that you assume in modeling a phenomenon.
For example, consider the experiment of flipping two fair coins. Let A be the event
that the first coin comes up heads, and let B be the event that the second coin is
heads. If we assume that A and B are independent, then the probability that both
coins come up heads is:

\[ \Pr[A \cap B] = \Pr[A] \cdot \Pr[B] = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}. \]

In this example, the assumption of independence is reasonable. The result of one
coin toss should have negligible impact on the outcome of the other coin toss. And
if we were to repeat the experiment many times, we would be likely to have \(A \cap B\)
about 1/4 of the time.
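
A quick simulation gives a feel for the "about 1/4 of the time" claim. This is a
minimal sketch under stated assumptions: each coin is modeled by a separate call to a
pseudo-random generator, so independence is built into the simulation rather than
tested by it. The function name both_heads_frequency and the seed are our own choices.

```python
import random

def both_heads_frequency(trials, seed=0):
    """Fraction of trials in which two simulated fair coins both land heads."""
    rng = random.Random(seed)        # fixed seed so that runs are repeatable
    hits = 0
    for _ in range(trials):
        first = rng.random() < 0.5   # coin 1 is heads with probability 1/2
        second = rng.random() < 0.5  # coin 2 is heads with probability 1/2
        if first and second:
            hits += 1
    return hits / trials

print(both_heads_frequency(100_000))
```

With 100,000 trials the printed frequency should land close to 0.25; it will rarely be
exactly 0.25, which is why the text says "about".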

There are, of course, many examples of events where assuming independence is
not justified. For example, let C be the event that tomorrow is cloudy and R be the
event that tomorrow is rainy. Perhaps \(\Pr[C] = 1/5\) and \(\Pr[R] = 1/10\) in Boston.
If these events were independent, then we could conclude that the probability of a
rainy, cloudy day was quite small:

\[ \Pr[R \cap C] = \Pr[R] \cdot \Pr[C] = \frac{1}{10} \cdot \frac{1}{5} = \frac{1}{50}. \]

Unfortunately, these events are definitely not independent; in particular, every rainy
day is cloudy. Thus, the probability of a rainy, cloudy day is actually 1/10.
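
The size of the error is easy to quantify. The short sketch below uses the
probabilities quoted above and treats "every rainy day is cloudy" as meaning that R is
a subset of C; the variable names are ours.

```python
from fractions import Fraction

pr_C = Fraction(1, 5)    # Pr[C]: tomorrow is cloudy (figure from the text)
pr_R = Fraction(1, 10)   # Pr[R]: tomorrow is rainy (figure from the text)

# What the product rule would give if R and C were independent.
assumed_independent = pr_R * pr_C

# What we get when every rainy day is cloudy, so that R ∩ C = R.
actual = pr_R

print(assumed_independent, actual, actual / assumed_independent)  # 1/50 1/10 5
```

Under these assumptions, the false independence assumption understates the probability
of a rainy, cloudy day by a factor of 5.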

Deciding when to assume that events are independent is a tricky business. In 
practice, there are strong motivations to assume independence since many useful 
formulas (such as Equation 16.2) only hold if the events are independent. But you 
need to be careful lest you end up deriving false conclusions. We'll see several 
famous examples where (false) assumptions of independence led to trouble over 
the next several chapters. This problem gets even trickier when there are more than 
two events in play. 



16.3 Mutual Independence 



16.3.1 Definition 

We have defined what it means for two events to be independent. What if there are 
more than two events? For example, how can we say that the flips of n coins are all 
independent of one another? 

Events \(E_1, \ldots, E_n\) are said to be mutually independent if and only if the
probability of any event \(E_i\) is unaffected by knowledge of the other events. More
formally:

Definition 16.3.1. A set of events \(E_1, E_2, \ldots, E_n\) is mutually independent if
\(\forall i \in [1, n]\) and \(\forall S \subseteq [1, n] - \{i\}\), either

\[ \Pr\Bigl[\bigcap_{j \in S} E_j\Bigr] = 0 \quad\text{or}\quad \Pr[E_i] = \Pr\Bigl[E_i \Bigm| \bigcap_{j \in S} E_j\Bigr]. \]

In other words, no matter which other events are known to occur, the probability
that \(E_i\) occurs is unchanged for any i.

For example, if we toss 100 fair coins at different times, we might reasonably 
assume that the tosses are mutually independent since the probability that the ith 
coin is heads should be 1/2, no matter which other coin tosses came out heads. 

16.3.2 Alternative Formulation 

Just as Theorem 16.1.2 provided an alternative definition of independence for two 
events, there is an alternative definition for mutual independence. 

Theorem 16.3.2. A set of events \(E_1, E_2, \ldots, E_n\) is mutually independent iff
\(\forall S \subseteq [1, n]\),

\[ \Pr\Bigl[\bigcap_{j \in S} E_j\Bigr] = \prod_{j \in S} \Pr[E_j]. \]

The proof of Theorem 16.3.2 uses induction and reasoning similar to the proof
of Theorem 16.1.2. We will not include the details here.

Theorem 16.3.2 says that \(E_1, E_2, \ldots, E_n\) are mutually independent if and only
if all of the following equations hold for all distinct i, j, k, and l:

\[
\begin{aligned}
\Pr[E_i \cap E_j] &= \Pr[E_i] \cdot \Pr[E_j] \\
\Pr[E_i \cap E_j \cap E_k] &= \Pr[E_i] \cdot \Pr[E_j] \cdot \Pr[E_k] \\
\Pr[E_i \cap E_j \cap E_k \cap E_l] &= \Pr[E_i] \cdot \Pr[E_j] \cdot \Pr[E_k] \cdot \Pr[E_l] \\
&\;\;\vdots \\
\Pr[E_1 \cap \cdots \cap E_n] &= \Pr[E_1] \cdots \Pr[E_n].
\end{aligned}
\]

For example, if we toss n fair coins, the tosses are mutually independent iff for
all \(m \in [1, n]\) and every subset of m coins, the probability that every coin in the
subset comes up heads is \(2^{-m}\).
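
Because Theorem 16.3.2 is a finite list of equations, it can be checked by brute force
when the sample space is small. The sketch below is one way to do that, assuming a
uniform model of n = 4 fair coins; the function mutually_independent and its arguments
are our own names, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

def mutually_independent(events, prob):
    """Check the product rule of Theorem 16.3.2 on every subset of 2 or more events."""
    def pr(event):
        return sum((prob[w] for w in event), Fraction(0))
    for size in range(2, len(events) + 1):
        for subset in combinations(events, size):
            joint = set.intersection(*subset)
            if pr(joint) != prod((pr(e) for e in subset), start=Fraction(1)):
                return False
    return True

n = 4
outcomes = list(product("HT", repeat=n))
prob = {w: Fraction(1, 2 ** n) for w in outcomes}   # assumed uniform model
heads = [{w for w in outcomes if w[i] == "H"} for i in range(n)]  # coin i is heads

print(mutually_independent(heads, prob))  # True
```

Subsets of size 0 and 1 are skipped because the product rule holds trivially for them.
The number of subsets doubles with each additional event, so this kind of exhaustive
check is only practical for a handful of events.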

16.3.3 DNA Testing 

Assumptions about independence are routinely made in practice. Frequently, such 
assumptions are quite reasonable. Sometimes, however, the reasonableness of an 
independence assumption is not so clear, and the consequences of a faulty assump- 
tion can be severe. 

For example, consider the following testimony from the O. J. Simpson murder 
trial on May 15, 1995: 

Mr. Clarke: When you make these estimations of frequency — and I believe you 
touched a little bit on a concept called independence? 

Dr. Cotton: Yes, I did. 

Mr. Clarke: And what is that again? 

Dr. Cotton: It means whether or not you inherit one allele that you have is not — 
does not affect the second allele that you might get. That is, if you inherit 
a band at 5,000 base pairs, that doesn't mean you'll automatically or with 
some probability inherit one at 6,000. What you inherit from one parent is 
what you inherit from the other. 

Mr. Clarke: Why is that important? 

Dr. Cotton: Mathematically that's important because if that were not the case, it 
would be improper to multiply the frequencies between the different genetic 
locations. 

Mr. Clarke: How do you — well, first of all, are these markers independent that 
you've described in your testing in this case? 

Presumably, this dialogue was as confusing to you as it was for the jury. Essentially,
the jury was told that genetic markers in blood found at the crime scene
matched Simpson's. Furthermore, they were told that the probability that the mark- 
ers would be found in a randomly-selected person was at most 1 in 170 million. 
This astronomical figure was derived from statistics such as: 

• 1 person in 100 has marker A.

• 1 person in 50 has marker B.

• 1 person in 40 has marker C.

• 1 person in 5 has marker D.

• 1 person in 170 has marker E.

Then these numbers were multiplied to give the probability that a randomly-selected 
person would have all five markers: 

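\[
\begin{aligned}
\Pr[A \cap B \cap C \cap D \cap E] &= \Pr[A] \cdot \Pr[B] \cdot \Pr[C] \cdot \Pr[D] \cdot \Pr[E] \\
&= \frac{1}{100} \cdot \frac{1}{50} \cdot \frac{1}{40} \cdot \frac{1}{5} \cdot \frac{1}{170} \\
&= \frac{1}{170{,}000{,}000}.
\end{aligned}
\]

The defense pointed out that this assumes that the markers appear mutually
independently. Furthermore, all the statistics were based on just a few hundred blood
samples.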
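
The arithmetic behind the 1 in 170 million figure can be reproduced in a few lines.
This sketch uses the marker frequencies listed above; the dictionary freq is our own
naming, and the product is meaningful only under the mutual independence assumption
that the defense questioned.

```python
from fractions import Fraction
from math import prod

# Marker frequencies quoted in the text.
freq = {
    "A": Fraction(1, 100),
    "B": Fraction(1, 50),
    "C": Fraction(1, 40),
    "D": Fraction(1, 5),
    "E": Fraction(1, 170),
}

# Product rule: valid only if the five markers are mutually independent.
all_five = prod(freq.values(), start=Fraction(1))
print(all_five)  # 1/170000000
```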

After the trial, the jury was widely mocked for failing to "understand" the DNA 
evidence. If you were a juror, would you accept the 1 in 170 million calculation? 



16.4 Pairwise Independence 

The definition of mutual independence seems awfully complicated — there are so 
many subsets of events to consider! Here's an example that illustrates the subtlety 
of independence when more than two events are involved. Suppose that we flip 
three fair, mutually-independent coins. Define the following events: 

• \(A_1\) is the event that coin 1 matches coin 2.

• \(A_2\) is the event that coin 2 matches coin 3.

• \(A_3\) is the event that coin 3 matches coin 1.

Are \(A_1\), \(A_2\), \(A_3\) mutually independent?
The sample space for this experiment is:

\[ \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}. \]

Every outcome has probability \((1/2)^3 = 1/8\) by our assumption that the coins are
mutually independent.

To see if events \(A_1\), \(A_2\), and \(A_3\) are mutually independent, we must check a
sequence of equalities. It will be helpful first to compute the probability of each
event \(A_i\):

\[
\begin{aligned}
\Pr[A_1] &= \Pr[HHH] + \Pr[HHT] + \Pr[TTH] + \Pr[TTT] \\
&= \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} \\
&= \frac{1}{2}.
\end{aligned}
\]

By symmetry, \(\Pr[A_2] = \Pr[A_3] = 1/2\) as well. Now we can begin checking all the
equalities required for mutual independence in Theorem 16.3.2:

\[
\begin{aligned}
\Pr[A_1 \cap A_2] &= \Pr[HHH] + \Pr[TTT] \\
&= \frac{1}{8} + \frac{1}{8} \\
&= \frac{1}{4} \\
&= \frac{1}{2} \cdot \frac{1}{2} \\
&= \Pr[A_1] \Pr[A_2].
\end{aligned}
\]

By symmetry, \(\Pr[A_1 \cap A_3] = \Pr[A_1] \cdot \Pr[A_3]\) and
\(\Pr[A_2 \cap A_3] = \Pr[A_2] \cdot \Pr[A_3]\) must hold also. Finally, we must check
one last condition:

\[
\begin{aligned}
\Pr[A_1 \cap A_2 \cap A_3] &= \Pr[HHH] + \Pr[TTT] \\
&= \frac{1}{8} + \frac{1}{8} \\
&= \frac{1}{4} \\
&\ne \Pr[A_1] \Pr[A_2] \Pr[A_3] = \frac{1}{8}.
\end{aligned}
\]

The three events \(A_1\), \(A_2\), and \(A_3\) are not mutually independent even though any
two of them are independent! This not-quite mutual independence seems weird at
first, but it happens. It even generalizes:

Definition 16.4.1. A set \(A_1, A_2, \ldots\) of events is k-way independent iff every set
of k of these events is mutually independent. The set is pairwise independent iff it
is 2-way independent.

So the events \(A_1\), \(A_2\), \(A_3\) above are pairwise independent, but not mutually
independent. Pairwise independence is a much weaker property than mutual independence.
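
The three-coin example is small enough to verify by enumeration. The sketch below
assumes the uniform sample space of eight outcomes described above and checks the
pairwise equations and the three-way equation separately; the names pairwise and
triple are ours.

```python
from fractions import Fraction
from itertools import combinations, product

outcomes = list(product("HT", repeat=3))
prob = {w: Fraction(1, 8) for w in outcomes}   # three fair, mutually independent coins

def pr(event):
    """Probability of an event given as a set of outcomes."""
    return sum((prob[w] for w in event), Fraction(0))

A1 = {w for w in outcomes if w[0] == w[1]}   # coin 1 matches coin 2
A2 = {w for w in outcomes if w[1] == w[2]}   # coin 2 matches coin 3
A3 = {w for w in outcomes if w[2] == w[0]}   # coin 3 matches coin 1

# 2-way equations: one per pair of events.
pairwise = all(pr(X & Y) == pr(X) * pr(Y) for X, Y in combinations([A1, A2, A3], 2))

# 3-way equation: the extra condition that mutual independence requires.
triple = pr(A1 & A2 & A3) == pr(A1) * pr(A2) * pr(A3)

print(pairwise, triple)  # True False
```

The three-way check fails because any two matches force the third: once coin 1 matches
coin 2 and coin 2 matches coin 3, coin 3 must match coin 1.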

For example, suppose that the prosecutors in the O. J. Simpson trial were wrong 
and markers A, B, C, D, and E appear only pairwise independently. Then the 
probability that a randomly-selected person has all five markers is no more than: 

\[
\begin{aligned}
\Pr[A \cap B \cap C \cap D \cap E] &\le \Pr[A \cap E] \\
&= \Pr[A] \cdot \Pr[E] \\
&= \frac{1}{100} \cdot \frac{1}{170} \\
&= \frac{1}{17{,}000}.
\end{aligned}
\]

The first line uses the fact that \(A \cap B \cap C \cap D \cap E\) is a subset of
\(A \cap E\). (We picked out the A and E markers because they're the rarest.) We use
pairwise independence on the second line. Now the probability of a random match is
1 in 17,000, a far cry from 1 in 170 million! And this is the strongest conclusion we
can reach assuming only pairwise independence.

On the other hand, the 1 in 17,000 bound that we get by assuming pairwise 
independence is a lot better than the bound that we would have if there were no 
independence at all. For example, if the markers are dependent, then it is possible 
that 

everyone with marker E has marker A,
everyone with marker A has marker B,
everyone with marker B has marker C, and
everyone with marker C has marker D.

In such a scenario, the probability of a match is

\[ \Pr[E] = 1/170. \]
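
The two weaker bounds can be recovered by searching over the marker frequencies. The
sketch below is an illustration under the frequencies quoted earlier: with pairwise
independence, any pair of markers gives an upper bound equal to the product of their
frequencies, and with no independence at all, any single marker gives an upper bound
equal to its frequency. The variable names are ours.

```python
from fractions import Fraction
from itertools import combinations

# Marker frequencies quoted in the text.
freq = {
    "A": Fraction(1, 100),
    "B": Fraction(1, 50),
    "C": Fraction(1, 40),
    "D": Fraction(1, 5),
    "E": Fraction(1, 170),
}

# Pairwise independence only: Pr[all five] <= Pr[X ∩ Y] = Pr[X] · Pr[Y] for any pair,
# so the tightest bound of this form comes from the pair with the smallest product.
pair_bound, pair = min((freq[x] * freq[y], (x, y)) for x, y in combinations(freq, 2))

# No independence at all: Pr[all five] <= Pr[X] for any single marker.
single_bound, marker = min((p, x) for x, p in freq.items())

print(pair, pair_bound)      # ('A', 'E') 1/17000
print(marker, single_bound)  # E 1/170
```

The search confirms that A and E are the right pair to pick, and that marker E alone
gives the 1/170 figure when nothing is assumed about independence.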

So a stronger independence assumption leads to a smaller bound on the probability
of a match. The trick is to figure out what independence assumption is
reasonable. Assuming that the markers are mutually independent may well not be 
reasonable unless you have examined hundreds of millions of blood samples. Oth- 
erwise, how would you know that marker D does not show up more frequently 
whenever the other four markers are simultaneously present? 

We will conclude our discussion of independence with a useful, and somewhat 
famous, example known as the Birthday Paradox. 



16.5 The Birthday Paradox 

Suppose that there are 100 students in a class. What is the probability that some 
birthday is shared by two people? Comparing 100 students to the 365 possible 
birthdays, you might guess the probability lies somewhere around 1/3 — but you'd 
be wrong: the probability that there will be two people in the class with matching 
birthdays is actually 0.999999692 .... In other words, the probability that all 100 
birthdays are different is less than 1 in 3,000,000. 

Why is this probability so small? The answer involves a phenomenon known as 
the Birthday Paradox (or the Birthday Principle), which is surprisingly important 
in computer science, as we'll see later. 

Before delving into the analysis, we'll need to make some modeling assump- 
tions: 

• For each student, all possible birthdays are equally likely. The idea under- 
lying this assumption is that each student's birthday is determined by a ran- 
dom process involving parents, fate, and, um, some issues that we discussed 
earlier in the context of graph theory. The assumption is not completely ac- 
curate, however; a disproportionate number of babies are born in August and 
September, for example. 

• Birthdays are mutually independent. This isn't perfectly accurate either. For 
example, if there are twins in the class, then their birthdays are surely not 
independent. 

We'll stick with these assumptions, despite their limitations. Part of the reason is 
to simplify the analysis. But the bigger reason is that our conclusions will apply to 
many situations in computer science where twins, leap days, and romantic holidays 
are not considerations. After all, whether or not two items collide in a hash table 
really has nothing to do with human reproductive preferences. Also, in pursuit of
generality, let's switch from specific numbers to variables. Let m be the number of
people in the room, and let N be the number of days in a year.

We can solve this problem using the standard four-step method. However, a tree 
diagram will be of little value because the sample space is so enormous. This time 
we'll have to proceed without the visual aid! 

Step 1: Find the Sample Space 

Let's number the people in the room from 1 to m. An outcome of the experiment
is a sequence \((b_1, \ldots, b_m)\) where \(b_i\) is the birthday of the ith person. The
sample space is the set of all such sequences:

\[ S = \{(b_1, \ldots, b_m) \mid b_i \in \{1, \ldots, N\}\}. \]

Step 2: Define Events of Interest 

Our goal is to determine the probability of the event A in which some pair of people
have the same birthday. This event is a little awkward to study directly, however.
So we'll use a common trick, which is to analyze the complementary event
\(\overline{A}\), in which all m people have different birthdays:

\[ \overline{A} = \{(b_1, \ldots, b_m) \in S \mid \text{all } b_i \text{ are distinct}\}. \]

If we can compute \(\Pr[\overline{A}]\), then we can compute what we really want,
\(\Pr[A]\), using the identity

\[ \Pr[A] + \Pr[\overline{A}] = 1. \]

Step 3: Assign Outcome Probabilities

We need to compute the probability that m people have a particular combination of
birthdays \((b_1, \ldots, b_m)\). There are N possible birthdays and all of them are
equally likely for each student. Therefore, the probability that the ith person was
born on day \(b_i\) is 1/N. Since we're assuming that birthdays are mutually
independent, we can multiply probabilities. Therefore, the probability that the first
person was born on day \(b_1\), the second on \(b_2\), and so forth is \((1/N)^m\). This
is the probability of every outcome in the sample space, which means that the sample
space is uniform. That's good news, because, as we have seen, it means that the
analysis will be simpler.

Step 4: Compute Event Probabilities 

We're interested in the probability of the event \(\overline{A}\) in which everyone has
a different birthday:

\[ \overline{A} = \{(b_1, \ldots, b_m) \mid \text{all } b_i \text{ are distinct}\}. \]

This is a gigantic set. In fact, there are N choices for \(b_1\), \(N - 1\) choices for
\(b_2\), and so forth. Therefore, by the Generalized Product Rule,

\[ |\overline{A}| = \frac{N!}{(N - m)!} = N(N - 1)(N - 2) \cdots (N - m + 1). \]

Since the sample space is uniform, we can conclude that

\[ \Pr[\overline{A}] = \frac{|\overline{A}|}{N^m} = \frac{N!}{N^m (N - m)!}. \tag{16.3} \]
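
Equation 16.3 can be sanity-checked against a direct count on a small instance and then
evaluated exactly for the classroom example. The sketch below is a minimal illustration;
the function names are ours, and the brute-force version is only feasible for tiny
values of m and N because it lists the whole sample space.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def pr_all_distinct(m, N):
    """Equation 16.3: N! / (N^m (N - m)!), for 0 <= m <= N."""
    return Fraction(factorial(N), N ** m * factorial(N - m))

def pr_all_distinct_brute_force(m, N):
    """Count the sequences in the sample space whose entries are all distinct."""
    sample_space = list(product(range(1, N + 1), repeat=m))
    distinct = [b for b in sample_space if len(set(b)) == m]
    return Fraction(len(distinct), len(sample_space))

# Small instance: 3 people, 5 possible birthdays.
print(pr_all_distinct(3, 5), pr_all_distinct_brute_force(3, 5))  # 12/25 12/25

# Classroom example: 100 students, 365 days.
print(float(pr_all_distinct(100, 365)))
```

The second print shows a value of roughly 3.07 · 10^-7, matching the "less than 1 in
3,000,000" figure quoted at the start of this section.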

We're done! 

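Or are we? While correct, it would certainly be nicer to have a closed-form expression
for Equation 16.3. That means finding an approximation for \(N!\) and \((N - m)!\).
But this is what we learned how to do in Section 9.6. In fact, since N and \(N - m\)
are each at least 100, we know from Corollary 9.6.2 that

\[ \sqrt{2\pi N} \left(\frac{N}{e}\right)^{N} \quad\text{and}\quad \sqrt{2\pi (N - m)} \left(\frac{N - m}{e}\right)^{N - m} \]

are excellent approximations (accurate to within .09%) of \(N!\) and \((N - m)!\),
respectively. Plugging these values into Equation 16.3 means that (to within .2%) [1]

\[
\begin{aligned}
\Pr[\overline{A}] &= \frac{\sqrt{2\pi N} \left(\frac{N}{e}\right)^{N}}{N^m \sqrt{2\pi (N - m)} \left(\frac{N - m}{e}\right)^{N - m}} \\
&= \sqrt{\frac{N}{N - m}} \cdot \frac{e^{N \ln(N) - N}}{e^{m \ln(N)} \, e^{(N - m)\ln(N - m) - (N - m)}} \\
&= \sqrt{\frac{N}{N - m}} \cdot e^{(N - m)\ln(N) - (N - m)\ln(N - m) - m} \\
&= \sqrt{\frac{N}{N - m}} \cdot e^{(N - m)\ln\left(\frac{N}{N - m}\right) - m} \\
&= e^{\left(N - m + \frac{1}{2}\right)\ln\left(\frac{N}{N - m}\right) - m}.
\end{aligned}
\tag{16.4}
\]

[1] If there are two terms that can be off by .09%, then the ratio can be off by at
most a factor of \((1.0009)^2 < 1.002\).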
We can now evaluate Equation 16.4 for m = 100 and N = 365 to find that the
probability that all 100 birthdays are different is [2]

\[ 3.07\ldots \cdot 10^{-7}. \]

We can also plug in other values of m to find the number of people so that the
probability of a matching birthday will be about 1/2. In particular, for m = 23 and
N = 365, Equation 16.4 reveals that the probability that all the birthdays differ is
0.49.... So in a room of 23 people, the probability that some pair
of people share a birthday will be a little better than 1/2. It is because 23 seems
like such a small number of people for a match that the phenomenon is called the
Birthday Paradox.
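
To see how close the approximation is, the closed form in Equation 16.4 can be
evaluated next to the exact product from Equation 16.3. The sketch below uses
floating-point arithmetic, which is accurate enough at this scale; the function names
exact and approx are ours.

```python
import math

def exact(m, N):
    """Equation 16.3 as a running product of (N - i) / N for i = 0, ..., m - 1."""
    p = 1.0
    for i in range(m):
        p *= (N - i) / N
    return p

def approx(m, N):
    """Equation 16.4: exp((N - m + 1/2) ln(N / (N - m)) - m)."""
    return math.exp((N - m + 0.5) * math.log(N / (N - m)) - m)

for m in (23, 100):
    e, a = exact(m, 365), approx(m, 365)
    print(f"m={m}: exact={e:.6g} approx={a:.6g} relative error={abs(a - e) / e:.2e}")
```

For both values of m, the relative error stays well inside the .2% margin claimed for
Equation 16.4.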

16.5.1 Applications to Hashing 

Hashing is frequently used in computer science to map large strings of data into 
short strings of data. In a typical scenario, you have a set of m items and you would 
like to assign each item to a number from 1 to N where no pair of items is assigned
to the same number and N is as small as possible. For example, the items might be
messages, addresses, or variables. The numbers might represent storage locations,
devices, indices, or digital signatures.

If two items are assigned to the same number, then a collision is said to occur.
Collisions are generally bad. For example, collisions can correspond to two variables
being stored in the same place or two messages being assigned the same digital
signature. Just imagine if you were doing electronic banking and your digital
signature for a $10 check were the same as your signature for a $10 million
check. In fact, finding collisions is a common technique in breaking cryptographic
codes. [3]

In practice, the assignment of a number to an item is done using a hash function 

\[ h : S \to [1, N], \]

where S is the set of items and \(m = |S|\). Typically, the values of h(S) are assigned
randomly and are assumed to be equally likely in [1, N] and mutually independent.

For efficiency purposes, it is generally desirable to make N as small as necessary
to accommodate the hashing of m items without collisions. Ideally, N would be
only a little larger than m. Unfortunately, this is not possible for random hash
functions. To see why, let's take a closer look at Equation 16.4.

[2] The possible .2% error is so small that it is lost in the . . . after 3.07.

[3] Such techniques are often referred to as birthday attacks because of the
association of such attacks with the Birthday Paradox.
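By Theorem 9.6.1 and the derivation of Equation 16.4, we know that the probability
that there are no collisions for a random hash function is

\[ \sim e^{\left(N - m + \frac{1}{2}\right)\ln\left(\frac{N}{N - m}\right) - m}. \tag{16.5} \]

For any m, we now need to find a value of N for which this expression is at least 1/2.
That will tell us how big the hash table needs to be in order to have at least a
50% chance of avoiding collisions. This means that we need to find a value of N
for which

\[ \left(N - m + \frac{1}{2}\right)\ln\left(\frac{N}{N - m}\right) - m \sim \ln\left(\frac{1}{2}\right). \tag{16.6} \]

To simplify Equation 16.6, we need to get rid of the \(\ln\left(\frac{N}{N - m}\right)\)
term. We can do this by using the Taylor Series expansion for

\[ \ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots \]

to find that [4]

\[
\begin{aligned}
\ln\left(\frac{N}{N - m}\right) &= -\ln\left(\frac{N - m}{N}\right) \\
&= -\ln\left(1 - \frac{m}{N}\right) \\
&= -\left(-\frac{m}{N} - \frac{m^2}{2N^2} - \frac{m^3}{3N^3} - \cdots\right) \\
&= \frac{m}{N} + \frac{m^2}{2N^2} + \frac{m^3}{3N^3} + \cdots.
\end{aligned}
\]

[4] This may not look like a simplification, but stick with us here.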
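Hence,

\[
\begin{aligned}
\left(N - m + \frac{1}{2}\right)\ln\left(\frac{N}{N - m}\right) - m
&= \left(N - m + \frac{1}{2}\right)\left(\frac{m}{N} + \frac{m^2}{2N^2} + \frac{m^3}{3N^3} + \cdots\right) - m \\
&= \left(m + \frac{m^2}{2N} + \frac{m^3}{3N^2} + \cdots\right) - \left(\frac{m^2}{N} + \frac{m^3}{2N^2} + \frac{m^4}{3N^3} + \cdots\right) \\
&\qquad + \frac{1}{2}\left(\frac{m}{N} + \frac{m^2}{2N^2} + \frac{m^3}{3N^3} + \cdots\right) - m \\
&= -\left(\frac{m^2}{2N} + \frac{m^3}{6N^2} + \frac{m^4}{12N^3} + \cdots\right) + \frac{1}{2}\left(\frac{m}{N} + \frac{m^2}{2N^2} + \frac{m^3}{3N^3} + \cdots\right).
\end{aligned}
\tag{16.7}
\]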

If N grows faster than \(m^2\), then the value in Equation 16.7 tends to 0 and
Equation 16.6 cannot be satisfied. If N grows more slowly than \(m^2\), then the value
in Equation 16.7 diverges to negative infinity, and, once again, Equation 16.6 cannot
be satisfied. This suggests that we should focus on the case where
\(N = \Theta(m^2)\), when Equation 16.7 simplifies to

\[ \sim \frac{-m^2}{2N} \]

and Equation 16.6 becomes

\[ \frac{-m^2}{2N} \sim \ln\left(\frac{1}{2}\right). \tag{16.8} \]

Equation 16.8 is satisfied when

\[ N \sim \frac{m^2}{2\ln(2)}. \]

In other words, N needs to grow quadratically with m in order to avoid collisions.
This unfortunate fact is known as the Birthday Principle and it limits the efficiency
of hashing in practice: either N is quadratic in the number of items being hashed
or you need to be able to deal with collisions.
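
The quadratic growth can be observed numerically. The sketch below is an illustration
under the same assumptions as the analysis (hash values uniform on [1, N] and mutually
independent). It finds, by binary search on the exact no-collision probability, the
smallest table size N that gives at least a 50% chance of no collision, and prints it
next to the estimate \(m^2 / (2\ln 2)\). The function names are ours.

```python
import math

def pr_no_collision(m, N):
    """Exact probability that m independent, uniform values in [1, N] are distinct."""
    p = 1.0
    for i in range(m):
        p *= (N - i) / N
    return p

def smallest_table(m):
    """Smallest N for which the no-collision probability is at least 1/2."""
    # The probability increases with N, so binary search applies. N = m * m is a safe
    # upper end: by the union bound, the collision probability there is at most
    # m(m - 1) / (2 m^2) < 1/2.
    lo, hi = m, m * m
    while lo < hi:
        mid = (lo + hi) // 2
        if pr_no_collision(m, mid) >= 0.5:
            hi = mid
        else:
            lo = mid + 1
    return lo

for m in (23, 100, 1000):
    print(m, smallest_table(m), round(m * m / (2 * math.log(2))))
```

The estimate is asymptotic, so the two columns agree only approximately for small m and
get relatively closer as m grows.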